[f]getc returns the character read or -1 on a read error. 0 is returned if the
end of file is reached. If the current input file, for 'getc', or "file", for
'fgetc', is the standard input file and has not been redirected away from the
keyboard, either function will return zero, 0, for keystrokes which cannot be
represented by an ASCII character. A second call must be made to obtain the
extended ASCII character. If the second call also returns zero, then an end of
file has been reached.
The "for ( i in a )" statement assigns to i the indices of a for all elements
in a. The sequencing of the values of i starts with any integer values, low to
high, then any string values following the ASCII collating sequence (the same
sequence as used by the C strcmp function).
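The ordering described above can be sketched in C. This is only an illustration of the stated rule (integers low to high, then strings in strcmp order), not QTAwk's actual implementation; the Index type and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index type: an array index is either an integer or a string. */
typedef struct {
    int is_string;      /* 0 = integer index, 1 = string index */
    long number;        /* used when is_string == 0 */
    const char *text;   /* used when is_string == 1 */
} Index;

/* qsort comparator: integers first (low to high), then strings by strcmp. */
static int compare_index(const void *pa, const void *pb)
{
    const Index *a = pa;
    const Index *b = pb;

    if (a->is_string != b->is_string)
        return a->is_string - b->is_string;   /* integers sort before strings */
    if (!a->is_string)
        return (a->number > b->number) - (a->number < b->number);
    return strcmp(a->text, b->text);
}

/* Sort n indices into the order described for "for ( i in a )". */
void sort_indices(Index *idx, size_t n)
{
    assert(idx != NULL || n == 0);
    qsort(idx, n, sizeof idx[0], compare_index);
}
```

Because strcmp compares character codes, upper case string indices sort before lower case ones in this sketch.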
The "while ()", "do ... while ()", "for (;;)", break and continue statements
are as in C. The switch/case statements are as in C, except that case
labels may be any valid QTAwk expression list. The following logic is followed
in matching case labels to the switch expression value:
if ( case_label_value == regular expression )
switch_value ~~ case_label_value
else switch_value == case_label_value
The next statement stops processing the pattern/action statements, reads in the
next record and restarts the patterns/actions with the first.
The cycle statement will check the current value of CYCLE_COUNT against the
value of MAX_CYCLE and perform the following actions: (CYCLE_COUNT is
initialized to 1 every time a new record is read from the current input file)
1. CYCLE_COUNT > MAX_CYCLE --> perform same action as "next"
statement.
2. CYCLE_COUNT <= MAX_CYCLE --> increment CYCLE_COUNT and restart
pattern/action processing with current values of NF, NR, FNR,
$0 and $i, 1 <= i <= NF
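The two cases above can be traced with a small C simulation. It assumes, for illustration only, that every pass over the pattern/action list ends by executing "cycle"; the function name is hypothetical and the code follows the rule exactly as worded here, not QTAwk's source.

```c
#include <assert.h>

/* Count how many times the pattern/action list runs for one record when
   every pass ends with a "cycle" statement (hypothetical simulation). */
int passes_for_record(int max_cycle)
{
    int cycle_count = 1;   /* CYCLE_COUNT is reset to 1 for each new record */
    int passes = 0;

    for (;;) {
        ++passes;                       /* run the pattern/action list once */
        if (cycle_count > max_cycle)    /* case 1: behave like "next" */
            break;
        ++cycle_count;                  /* case 2: increment and restart */
    }
    return passes;
}
```

Read literally, the rule gives MAX_CYCLE + 1 passes per record under this assumption, since the comparison is made before the increment.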
An exit statement will cause the END actions to be performed or if
encountered in an END action will cause termination of the program. The
optional expression is returned as the exit status unless overridden by a
further exit statement in an END action.
An endfile statement simulates an end-of-file on the current input file. Any
remaining pattern/action pairs are skipped and any FINAL actions are executed
before processing the next command line argument.
The return statement may be used only in function declarations. It may have an
optional value which is returned as the value of the function. The value of a
function defaults to zero/null string (0/"").
REGULAR EXPRESSIONS
^ matches Beginning of Line as first character of expression
$ matches End of Line as last character of expression
\c matches following (hexadecimal value shown in parenthesis):
\a == bell (alert) ( \x07 )
\b == backspace ( \x08 )
\f == formfeed ( \x0c )
\n == newline ( \x0a )
\r == carriage return ( \x0d )
\s == space ( \x20 )
\t == horizontal tab ( \x09 )
\v == vertical tab ( \x0b )
\c == c [ \\ == \ ]
\ooo == character represented by octal value ooo
1 to 3 octal digits acceptable
\xhh == character represented by hexadecimal value hh
1 or 2 hexadecimal digits acceptable
. matches any character
[abc0-9] Character Class - match any character in class
[^abc0-9] Negated Character Class - match any character not in class
[!abc0-9] Negated Character Class - match any character not in class
[#abc0-9] Matched Character Class - for second match, class character
must match in corresponding position
* - Closure, Zero or more matches
+ - Positive Closure, One or More matches
? - Zero or One matches
r(s)t embedded regular expression s
r|s|t '|' == logical 'or' operator. Expression r or s or t
@ - Look-Ahead, r@t, matches regular expression 'r' only when r is
followed by regular expression 't'. Regular expression t not
contained in final match. Symbol loses special meaning when
contained within parentheses, '()', or character class, '[]'.
r{n1,n2} - at least n1 and up to n2 repetitions of expression r
n1, n2 integers with 1 <= n1 <= n2
r{2,6} ==> rrr?r?r?r?
r{3,3} ==> rrr
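The expansions shown above follow a simple pattern: n1 copies of the expression, then (n2 - n1) optional copies. A minimal C sketch of that pattern is given here; expand_repetition is a hypothetical helper written for illustration and is not part of QTAwk.

```c
#include <string.h>

/* Expand unit{n1,n2} into n1 copies of unit followed by (n2 - n1) copies of
   "unit?". Returns 0 on success, -1 on bad arguments or a too-small buffer. */
int expand_repetition(const char *unit, int n1, int n2, char *out, size_t size)
{
    size_t used = 0;
    size_t len = strlen(unit);
    int i;

    if (n1 < 1 || n2 < n1 || size == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < n2; ++i) {
        size_t need = len + (i >= n1 ? 1 : 0);   /* unit plus optional '?' */
        if (used + need + 1 > size)
            return -1;
        memcpy(out + used, unit, len);
        used += len;
        if (i >= n1)
            out[used++] = '?';
        out[used] = '\0';
    }
    return 0;
}
```

Calling expand_repetition("r", 2, 6, buf, sizeof buf) leaves "rrr?r?r?r?" in buf, matching the first expansion listed above.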
Expressions grouped by "", (), [], or names, "{name}", are
repeated as a group (note the treatment of quoted expressions):
(r){2,6} ==> (r)(r)(r)?(r)?(r)?(r)?
[r]{2,6} ==> [r][r][r]?[r]?[r]?[r]?
{r}{2,6} ==> {r}{r}{r}?{r}?{r}?{r}?
"r"{2,6} ==> "rr(r)?(r)?(r)?(r)?"
{named_expr} - named expression. In regular expressions "{name}"
is replaced by the value of the corresponding variable. Unrecognized
variable names are not replaced. Names starting with an underscore and
followed by a single upper or lower case letter are reserved as
predefined. The following predefined names are currently available:
{_a} == [A-Za-z] Alphabetic
{_b} == [{}()[]<>] Brackets
{_c} == [\x001-\x01f\x07f] Control character
{_d} == [0-9] Digit
{_e} == [DdEe][-+]?{_d}{1,3} Exponent
{_f} == [-+]?({_d}+\.{_d}*|{_d}*\.{_d}+) Floating point number
{_g} == {_f}({_e})? float, optional exponent
{_h} == [0-9A-Fa-f] Hex-digit
{_i} == [-+]?{_d}+ Integer
{_n} == [A-Za-z0-9] alpha-Numeric
{_o} == [0-7] Octal digit
{_p} == [\!-/:-@[-`{-\x07f] Punctuation
{_q} == {_s}[\"'] double or single Quote
{_r} == {_f}{_e} Real number
{_s} == (^|[!\\](\\\\)*) zero or even number of Slashes
{_t} == [\s-~] printable character
{_u} == [\!-~] graphical character
{_w} == [\s\t] White space
{_z} == [\t-\r\s] space, \t, \n, \v, \f, \r, \s
PRINTF FORMAT
QTAwk follows the ANSI standard for the C Language for the format string in the
printf and fprintf functions except for the 'P' and 'n' types, which are not
supported and will give unpredictable results.
A format specification has the form:
%[flags][width][.precision][h | l | L]type
which is matched by the following regular expression:
/%{flags}?{width}?{precision}?[hlL]?{type}/
with:
flags = /[-+\s#0]/;
width = /({_d}+|\*)/;
precision = /(\.({_d}+|\*))/;
type = /[diouxXfeEgGcs]/;
Each field of the format specification is a single character or a number
signifying a particular format option. The type character, which appears after
the last optional format field (optional fields are shown in brackets, '[..]'),
determines whether the associated argument is interpreted as a character, a
string, or a number.
The simplest format specification contains only the percent sign and a type
character (for example, %s). The optional fields control other aspects of the
formatting, as follows:
flags ==> Control justification of output and printing of signs,
blanks, decimal points, octal and hexadecimal prefixes.
width ==> Control minimum number of characters output.
precision ==> Controls maximum number of characters printed for all or
part of the output field, or minimum number of digits printed
for integer values.
h, l, L ==> Prefixes that determine size of argument expected (this
field is retained only for compatibility to C format strings).
h ==> Used as a prefix with the integer types d, i, o, x, and X to
specify that the argument is short int, or with u to specify a
short unsigned int
l ==> Used as a prefix with d, i, o, x, and X types to specify that
the argument is long int, or with u to specify a long unsigned int;
also used as a prefix with e, E, f, g, and G types to specify a
double, rather than a float
L ==> Used as a prefix with e, E, f, g, and G types to specify a long
double
If a percent sign, '%', is followed by a character that has no meaning as a
format field, the character is simply copied to the output. For example, to
print a percent-sign character, use "%%".
Type characters:
d ==> integer, Signed decimal integer
i ==> integer, Signed decimal integer
u ==> integer, Unsigned decimal integer
o ==> integer, Unsigned octal integer
x ==> integer, Unsigned hexadecimal integer, using "abcdef"
X ==> integer, Unsigned hexadecimal integer, using "ABCDEF"
f ==> float, Signed value having the form [-]dddd.dddd, where dddd is
one or more decimal digits. The number of digits before the
decimal point depends on the magnitude of the number, and the
number of digits after the decimal point depends on the requested
precision.
e ==> float, Signed value having the form [-]d.dddd e [sign]ddd, where
d is a single decimal digit, dddd is one or more decimal digits,
ddd is exactly three decimal digits, and sign is + or -.
E ==> float, Identical to the e format, except that E introduces the
exponent instead of e.
g ==> float, Signed value printed in f or e format, whichever is more
compact for the given value and precision. The e format is used
only when the exponent of the value is less than -4 or greater than
the precision argument. Trailing zeros are truncated and the
decimal point appears only if one or more digits follow it.
G ==> float, Identical to the g format, except that G introduces the
exponent (where appropriate) instead of e.
c ==> character, Single character
s ==> string, Characters printed up to the first null character ('\0')
or until the precision value is reached.
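Since the format string follows the C standard, the type characters can be tried out with the C library itself. The sketch below uses snprintf from standard C (C99); format_types is a hypothetical name, and QTAwk's own output may differ where this text documents differences (for example the three-digit exponent of the e format, which is why e is not shown).

```c
#include <stdio.h>

/* Format sample values with several type characters, separated by '|'.
   Returns the number of characters that the full result requires. */
int format_types(char *buf, size_t size)
{
    return snprintf(buf, size, "%d|%u|%o|%x|%X|%c|%s|%.2f|%%",
                    -42,       /* d: signed decimal */
                    42u,       /* u: unsigned decimal */
                    8u,        /* o: unsigned octal */
                    255u,      /* x: hexadecimal, lower case digits */
                    255u,      /* X: hexadecimal, upper case digits */
                    'A',       /* c: single character */
                    "awk",     /* s: string */
                    3.14159);  /* f: fixed-point, two decimal places */
}
```

With a standard C library this fills the buffer with -42|42|10|ff|FF|A|awk|3.14|%.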
Flag Characters
- ==> Left justify the result within the given field width. Default:
Right justify.
+ ==> Prefix the output value with a sign (+ or -) if the output value
is of a signed type. Default: Sign appears only for negative
signed values (-).
blank (' ') ==> Prefix the output value with a blank if the output
value is signed and positive. The blank is ignored if both the
blank and + flags appear. Default: No blank.
# ==> When used with the o, x, or X format, the # flag prefixes any
nonzero output value with 0, 0x, or 0X, respectively. Default: No
prefix.
# ==> When used with the e, E, or f format, the # flag forces the
output value to contain a decimal point in all cases. Default:
Decimal point appears only if digits follow it.
# ==> When used with the g or G format, the # flag forces the output
value to contain a decimal point in all cases and prevents the
truncation of trailing zeros. Default: Decimal point appears only
if digits follow it. Trailing zeros are truncated.
# ==> Ignored when used with c, d, i, u or s
0 ==> For d, i, o, u, x, X, e, E, f, g, and G conversions, leading
zeros (following any indication of sign or base) are used to pad to
the field width; no space padding is performed. If the 0 and -
flags both appear, the 0 flag will be ignored. For d, i, o, u, x,
and X conversions, if a precision is specified, the 0 flag will be
ignored. For other conversions the behavior is undefined.
Default: Use blank padding
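The flag characters can be compared side by side with a short C sketch. It relies only on standard C snprintf behavior; format_flags is a hypothetical name, and the brackets are there just to make padding visible.

```c
#include <stdio.h>

/* Show the effect of each flag character; brackets delimit each field. */
int format_flags(char *buf, size_t size)
{
    return snprintf(buf, size, "[%-5d][%+d][% d][%#o][%#x][%05d][%#.0f]",
                    42,     /* '-': left justify in a width of 5 */
                    42,     /* '+': always print a sign */
                    42,     /* ' ': blank in place of a plus sign */
                    8u,     /* '#' with o: leading 0 */
                    255u,   /* '#' with x: leading 0x */
                    42,     /* '0': pad with zeros to a width of 5 */
                    3.0);   /* '#' with f: keep the decimal point */
}
```

With a standard C library the result is [42   ][+42][ 42][010][0xff][00042][3.].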
If the argument corresponding to a floating-point specifier is infinite or
indefinite, the following output is produced:
+ infinity ==> 1.#INFrandom-digits
- infinity ==> -1.#INFrandom-digits
Indefinite ==> digit.#INDrandom-digits
The width argument is a non-negative decimal integer controlling the minimum
number of characters printed. If the number of characters in the output value
is less than the specified width, blanks are added to the left or the right of
the values (depending on whether the - flag is specified) until the minimum
width is reached. If width is prefixed with a 0 flag, zeros are added until
the minimum width is reached (not useful for left-justified numbers).
The width specification never causes a value to be truncated; if the number of
characters in the output value is greater than the specified width, or width is
not given, all characters of the value are printed (subject to the precision
specification).
The width specification may be an asterisk (*), in which case an integer
argument from the argument list supplies the value. The width argument must
precede the value being formatted in the argument list. A nonexistent or small
field width does not cause a truncation of a field; if the result of a
conversion is wider than the field width, the field expands to contain the
conversion result.
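A short C sketch of the width rules, including a width supplied through an asterisk and a value wider than its field. It assumes standard C snprintf behavior; format_widths is a hypothetical name.

```c
#include <stdio.h>

/* Demonstrate minimum field width; brackets delimit each field. */
int format_widths(char *buf, size_t size)
{
    return snprintf(buf, size, "[%*d][%-*d][%2d]",
                    6, 42,     /* width 6 taken from the argument list */
                    6, 42,     /* same width, left justified */
                    12345);    /* wider than 2: the field expands */
}
```

With a standard C library the result is [    42][42    ][12345]. Note that each width argument comes immediately before the value it applies to.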
The precision specification is a non-negative decimal integer preceded by a
period, '.', which specifies the number of characters to be printed, the number
of decimal places, or the number of significant digits. Unlike the width
specification, the precision can cause truncation of the output value, or
rounding in the case of a floating-point value.
The precision specification may be an asterisk, '*', in which case an integer
argument from the argument list supplies the value. The precision argument
must precede the value being formatted in the argument list.
The interpretation of the precision value, and the default when precision is
omitted, depend on the type, as shown below:
d,i,u,o,x,X ==> The precision specifies the minimum number of digits to be
printed. If the number of digits in the argument is less than precision, the
output value is padded on the left with zeros. The value is not truncated
when the number of digits exceeds precision. Default: If precision is 0 or
omitted entirely, or if the period (.) appears without a number following it,
the precision is set to 1.
e, E ==> The precision specifies the number of digits to be printed after the
decimal point. The last printed digit is rounded. Default: Default
precision is 6; if precision is 0 or the period (.) appears without a number
following it, no decimal point is printed.
f ==> The precision value specifies the number of digits after the decimal
point. If a decimal point appears, at least one digit appears before it. The
value is rounded to the appropriate number of digits. Default: Default
precision is 6; if precision is 0, or if the period (.) appears without a
number following it, no decimal point appears.
g, G ==> The precision specifies the maximum number of significant digits
printed. Default: Six significant digits are printed, with any trailing
zeros truncated.
c ==> No effect. Default: Character printed.
s ==> The precision specifies the maximum number of characters to be printed.
Characters in excess of precision are not printed. Default: All characters
of the string are printed.
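The per-type meaning of precision can be checked with a C sketch like the following. It assumes standard C snprintf behavior; format_precisions is a hypothetical name, and the values were chosen to avoid rounding ties.

```c
#include <stdio.h>

/* Demonstrate precision for integer, f, s and g types. */
int format_precisions(char *buf, size_t size)
{
    return snprintf(buf, size, "[%.3d][%.2f][%.0f][%.3s][%.*f][%.3g][%g]",
                    7,          /* d: at least 3 digits, zero padded */
                    3.14159,    /* f: 2 digits after the decimal point */
                    7.0,        /* f: precision 0, no decimal point */
                    "abcdef",   /* s: at most 3 characters */
                    1, 9.87,    /* f: precision 1 taken from the arguments */
                    3.14159,    /* g: 3 significant digits */
                    2.50);      /* g: trailing zeros dropped */
}
```

With a standard C library the result is [007][3.14][7][abc][9.9][3.14][2.5].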
EXAMPLES
Print lines longer than 72 characters (missing action is print):
length($0) > 72
or
length > 72
Print first two fields in opposite order (missing pattern is always match):
{ print $2, $1; }
Add up first column, print sum and average:
{ s += $1; }
END { print "sum is", s, "average is", s/NR }
Print fields in reverse order:
{ for ( i = NF ; i > 0 ; --i ) print $i; }
Print all lines between start/stop pairs:
/start/,/stop/
Print all lines whose first field is different from previous one:
$1 != prev { print; prev = $1; }
Convert date from MM/DD/YY to metric (YYMMDD):
{ n = split(sdate(0),a,"/"); date = a[3] ∩ a[1] ∩ a[2] }
Copy a C program and insert include files:
$1 == "#include" && $2 ~~ /^"/ {
local tfile = $2;
gsub(/"/, "",tfile);
while ( fgetline(tfile,tmp) > 0 ) print tmp;
next;
}
{ print; }
AUTHOR
QTAwk
Utility Creation Program
Version 4.01 07-04-90
(c) Copyright 1988 - 1990 Pearl Boldt. All Rights Reserved.
All warranties as to this software, whether express or implied, including
without limitation any implied warranties of merchantability, fitness for a
particular purpose, functionality or data integrity or protection, are
disclaimed.
You are free to use, copy and give to others QTAwk for noncommercial use IF:
=> NO FEE IS CHARGED FOR USE, COPYING OR DISTRIBUTION.
=> ALL FILES AND DOCUMENTATION ACCOMPANY THE PROGRAMS.
=> THEY ARE NOT MODIFIED IN ANY WAY.
If you find QTAwk convenient to use, a registration of $35 would be appreciated. If you send $50 or more you will receive, when available, the current and next version of the QTgrep, QSgrep and flagset programs and documentation.
Pearl Boldt
13012 Birdale Lane
Darnestown, MD 20878
If you find QTAwk convenient to use, a registration of $35 would be
appreciated. If you send $50 or more you will receive the current and, when
available, next version of the QTAwk program and documentation.
Questions may be sent to the address above or to
CompuServe ID: 72040.434
Site licenses are available for QTAwk. Inquiries should be addressed to the
copyright holder listed above or the CompuServe ID listed.
All inquiries concerning the use and/or distribution of QTAwk, documentation and
accompanying files should be addressed to the copyright holder listed above or